Protein expression is a subcomponent of gene expression. It consists of the stages after DNA has been transcribed into messenger RNA (mRNA): the mRNA is translated into polypeptide chains, which are ultimately folded into proteins. The term is also commonly used by proteomics researchers to denote the measurement of the presence and abundance of one or more proteins in a particular cell or tissue.
Protein expression systems are very widely used in the life sciences, biotechnology and medicine. Molecular biology research uses an enormous number of proteins and enzymes, many of which come from expression systems, particularly DNA polymerase for PCR, reverse transcriptase for RNA analysis and restriction endonucleases for cloning. There are also significant medical applications for expression systems, notably the production of human insulin to treat diabetes.
Commonly used protein expression systems include those derived from bacteria,[1] yeast,[2] baculovirus/insect,[3] and mammalian cells.[4][5]
The oldest and most widely used expression systems are cell-based and may be defined as the "combination of an expression vector, its cloned DNA, and the host for the vector that provide a context to allow foreign gene function in a host cell, that is, produce proteins at a high level".[6][7] Expression is often driven to a very high level and is therefore referred to as overexpression.
There are many ways to introduce foreign DNA into a cell for expression, and many different host cells may be used; each expression system has distinct advantages and liabilities. Expression systems are normally referred to by the host and by the DNA source or delivery mechanism for the genetic material. Common hosts are bacteria (such as E. coli and B. subtilis), yeast (such as S. cerevisiae) and eukaryotic cell lines. Common DNA sources and delivery mechanisms are viruses (such as baculovirus, retrovirus and adenovirus), plasmids, artificial chromosomes and bacteriophage (such as lambda). The best choice of expression system depends on the gene involved: for example, Saccharomyces cerevisiae is often preferred for proteins that require significant posttranslational modification, and insect or mammalian cell lines are used when human-like splicing of the mRNA is required. Nonetheless, bacterial expression has the advantage of easily producing large amounts of protein, which is required for structure determination by X-ray crystallography or nuclear magnetic resonance experiments.
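The selection guidance above can be restated as a small decision rule. The following Python sketch is purely illustrative: the function name, its parameters and the order in which the criteria are checked are assumptions made for this example, and real host selection depends on many more factors than the two considered here.

```python
def suggest_host(needs_ptm: bool, needs_humanlike_splicing: bool) -> str:
    """Toy restatement of the host-selection guidance in the text.

    The priority order (splicing first, then posttranslational
    modification) is an assumption of this sketch, not an
    established rule.
    """
    if needs_humanlike_splicing:
        # Human-like mRNA splicing: insect or mammalian cell lines.
        return "insect or mammalian cell line"
    if needs_ptm:
        # Significant posttranslational modification: yeast is often preferred.
        return "Saccharomyces cerevisiae"
    # Otherwise bacterial expression easily yields large amounts of protein.
    return "Escherichia coli"


if __name__ == "__main__":
    # Example: a protein needed in bulk for crystallography, with no
    # special processing requirements.
    print(suggest_host(needs_ptm=False, needs_humanlike_splicing=False))
```

Under these assumptions, a protein with no special processing requirements is routed to the bacterial host, matching the remark that bacterial expression suits work requiring large quantities of protein.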
E. coli is one of the most widely used expression hosts, and DNA is normally introduced in a plasmid expression vector. The techniques for overexpression in E. coli are well developed and work by increasing the number of copies of the gene or by increasing the binding strength of the promoter region, thereby assisting transcription.
For example, a DNA sequence for a protein of interest could be cloned or subcloned into a high copy-number plasmid containing the lac promoter, which is then transformed into the bacterium Escherichia coli. Addition of IPTG (a lactose analog) activates the lac promoter and causes the bacteria to express the protein of interest.
Non-pathogenic species of the gram-positive Corynebacterium are used for the commercial production of various amino acids. The C. glutamicum species is widely used for producing glutamate and lysine,[8] components of human food, animal feed, and pharmaceutical products.
Expression of functionally active human epidermal growth factor has been done in C. glutamicum,[9] thus demonstrating a potential for industrial-scale production of human proteins. Expressed proteins can be targeted for secretion through either the general secretory pathway (Sec) or the twin-arginine translocation pathway (Tat).[10]
Unlike gram-negative bacteria, the gram-positive Corynebacterium lack lipopolysaccharides that function as antigenic endotoxins in humans.
Cell-free expression of proteins is possible using purified RNA polymerase, ribosomes, tRNA and ribonucleotides. These reagents may be produced by extraction from cells or from a cell-based expression system. Because of the low expression levels and high cost of cell-free systems, cell-based systems are more widely used.